Two-Level Incremental Checkpoint Recovery Scheme for Reducing System Total Overheads

Authors

  • Huixian Li
  • Liaojun Pang
  • Zhangquan Wang

Abstract

Long-running applications are often subject to failures, and each failure can lead to unacceptable system overheads. Checkpointing is used to reduce the losses incurred when a failure occurs. In the two-level checkpoint recovery scheme used for long-running tasks, the system must periodically transfer a large memory context to remote stable storage. Consequently, the overhead of setting checkpoints and the re-computation time become critical issues that directly affect the total system overhead. Motivated by these concerns, this paper presents a new model that introduces incremental checkpoints (i-checkpoints) into the existing two-level checkpoint recovery scheme, so that the more probable failures are handled at lower cost and with faster recovery. The proposed scheme does not depend on a specific failure distribution and can be applied to different failure distribution types. We compare the two-level incremental and the two-level checkpoint recovery schemes under the Weibull and exponential distributions, the two distributions that best fit real-world failure behavior. The results show that the total checkpoint-setting overhead, the total re-computation time, and the total system overhead of the two-level incremental checkpoint recovery scheme are all significantly smaller than those of the two-level checkpoint recovery scheme. Finally, the limitations of the study are discussed, and open questions and possible future work are outlined.
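The abstract does not describe how an i-checkpoint is constructed. As a rough illustration of the general incremental-checkpointing idea, and not of the authors' model, the Python sketch below takes one full checkpoint of a page-addressed memory and afterwards records only the pages written since the previous checkpoint; recovery replays those increments on top of the full copy. The class `IncrementalCheckpointer`, the page-level granularity, and the write-tracking dirty set are hypothetical assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical memory model (assumption): page number -> page contents.
Memory = Dict[int, bytes]


class IncrementalCheckpointer:
    """Toy full-plus-incremental checkpointing (illustrative assumption, not the paper's model).

    Pages are only ever overwritten, never freed, so a delta never needs to
    record deletions.
    """

    def __init__(self, memory: Memory) -> None:
        self.memory: Memory = memory
        self.dirty: Set[int] = set()        # pages written since the last checkpoint
        self.full: Memory = {}              # most recent full checkpoint
        self.increments: List[Memory] = []  # deltas taken after self.full

    def write(self, page: int, data: bytes) -> None:
        """Update one page and mark it dirty."""
        self.memory[page] = data
        self.dirty.add(page)

    def full_checkpoint(self) -> int:
        """Copy every page; earlier increments become obsolete. Returns pages saved."""
        self.full = dict(self.memory)  # bytes are immutable, so a shallow copy suffices
        self.increments = []
        self.dirty.clear()
        return len(self.full)

    def incremental_checkpoint(self) -> int:
        """Save only the dirty pages. Returns pages saved."""
        delta = {page: self.memory[page] for page in self.dirty}
        self.increments.append(delta)
        self.dirty.clear()
        return len(delta)

    def restore(self) -> Memory:
        """Rebuild the last checkpointed state: full copy, then each delta in order."""
        state = dict(self.full)
        for delta in self.increments:
            state.update(delta)
        return state


if __name__ == "__main__":
    ckpt = IncrementalCheckpointer({page: b"\x00" for page in range(8)})
    print(ckpt.full_checkpoint())         # prints 8
    ckpt.write(3, b"\x01")
    ckpt.write(5, b"\x02")
    print(ckpt.incremental_checkpoint())  # prints 2
    ckpt.write(3, b"\x03")
    print(ckpt.incremental_checkpoint())  # prints 1
    ckpt.write(7, b"\x04")                # not yet checkpointed: lost on failure
    restored = ckpt.restore()
    print(restored[3], restored[5], restored[7])  # prints b'\x03' b'\x02' b'\x00'
```

Because each increment holds only the dirty pages, its size tracks the write rate rather than the total memory size, which is why incremental checkpoints can be cheaper to take than full ones; the trade-off is that recovery must replay every increment taken since the last full checkpoint.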

Similar resources

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and, as a result, the workload of Mobile ...

Coordinated Checkpointing Without Direct Coordination

Coordinated checkpointing is a well-known method to achieve fault tolerance in distributed systems. Long-running parallel applications and high-availability applications are two potential users of checkpointing, although with different requirements. Parallel applications need low failure-free overheads, and high-availability applications require fast and bounded recoveries. In this paper, we des...

A Two-level Application Transparent Checkpointing Scheme in Cloud Computing Environment

Cloud computing has been widely applied to a wide variety of computing environments. Compared with the traditional distributed computing environment, cloud computing uses virtual machines to achieve dynamic resource partitioning. Checkpoint recovery technology is a low-cost method to improve system availability. This paper analyzes the characteristics of cloud computing, virtualization technology a...

Another Two-Level Failure Recovery Scheme: Performance

This report deals with the design and evaluation of a "two-level" failure recovery scheme for distributed systems. In our previous work [30, 32], we motivated a "two-level" recovery approach that tolerates the more probable failures with a low overhead, and less probable failures with possibly higher overhead. The two-level approach can achieve a smaller overhead as compared to traditional recov...

A Case for Multi-Level Distributed Recovery Schemes

Most of the distributed recovery schemes proposed in the literature are designed to tolerate an arbitrary number of failures, with a few notable exceptions of schemes designed to tolerate single failures. In this report, we demonstrate that it is often advantageous to use "multi-level" recovery schemes. A "multi-level" recovery scheme is one that can tolerate different numbers of faults at different...

Volume: 9

Publication date: 2014